INFERENCE ON QUANTILE REGRESSION PROCESS,
AN ALTERNATIVE

VICTOR  CHERNOZHUKOV 


Abstract. A very simple and practical resampling test is offered as an alternative to
inference based on Khmaladzation, as developed in Koenker and Xiao (2002a). This
alternative has competitive or better power, accurate size, and does not require estimation
of non-parametric sparsity and score functions. It applies not only to iid but also to time
series data. Computational experiments and an empirical example that re-examines the
effect of a re-employment bonus on unemployment duration support this approach.


Key  Words:    bootstrap,  subsampling,  quantile  regression,  quantile  regression  process, 
Kolmogorov-Smirnov  test,  unemployment  duration 


1.  Introduction 

Inference  in  quantile  regression  models,  pioneered  by  the  classical  work  of  Koenker  and 
Bassett  (1978),  is  crucial  to  a  wide  range  of  economic  analyses.  For  example,  evaluation 
of the distributional consequences of social programs requires inference concerning the nature,
direction,  and  quantity  of  the  impact  throughout  the  entire  outcomes  distribution.  See  e.g. 
Abadie  (2002),  Buchinsky  (1994),  Heckman  and  Smith  (1997),  Gutenbrunner  and  Jureckova 
(1992),  McFadden  (1989),  Koenker  and  Xiao  (2002a),  and  Portnoy  (2001).  Just  like  in  the 
classical  p-sample  theory,  e.g.  Doksum  (1974)  and  Shorack  and  Wellner  (1986),  this  kind  of 
inference  is  based  on  the  empirical  quantile  regression  process.  It  differs  however  from  the 
early  approaches  by  replacing  the  basic  (indicator)  regressors  with  general  ones. 

The main difficulty associated with such inference is the Durbin problem: the model's
features, estimated nuisance parameters, or non-i.i.d. data induce parameter-dependent
asymptotics, jeopardizing distribution-free inference.[1] In a recent Econometrica paper, Koenker
and Xiao (2001) proposed an ingenious and intricate theory, based on the Khmaladze
transformation, that purges the test statistics of the non-distribution-free components, restoring
distribution-free inference. The approach uses recursive projections to annihilate the
predictable component in the process, leaving a martingale that converges to a standard Brownian
motion, thus overcoming the Durbin problem.

Here we suggest a simple resampling alternative that (i) does not require the somewhat
complex Khmaladze transformation, (ii) does not require the estimation of the nonparametric
nuisance and score functions, (iii) has accurate size and optimal power (in
the sense that it has the same power as the test with a known critical value[2]), which makes it
very competitive with Khmaladzation, (iv) is robust to dependent data, and (v) is
computationally and practically attractive. Therefore, the approach is a useful complement to
Khmaladzation and is aimed at substantively expanding the scope of empirical inference.

The basic idea is extremely simple. The key statistic is based on the quantile regression
process. The statistic has a limit distribution, denoted H, under the null hypothesis. In
order to estimate H correctly, regardless of whether the null is true or not, we resample
an appropriately re-centered quantile process. As a result, H as well as the entire null
law of the process are correctly estimated under local departures from the null. For
resampling purposes, we choose the subsample bootstrap, cf. Politis, Romano, and Wolf
(1999), which has computational, practical, and certain theoretical advantages over the
usual (unsmoothed) bootstrap for quantile regression, cf. Buchinsky (1995) and Sakov and
Bickel (2000). The smoothed bootstrap of Horowitz (1992) may also be an attractive resampling
mechanism for these tests.

The underlying principle differs from the conventional bootstrap tests for goodness of fit.
The conventional tests resample from a probability model that is consistent with the null,
see e.g. Romano (1988), Andrews (1997), and Abadie (2002). Although such an approach is
potentially useful in quantile regression settings, its validity remains unknown, because the
quantile regression families estimated in empirical work are typically mis-specified and
incomplete probability models (regression quantile lines are not constrained to avoid crossing,
and the tail regression quantiles are not estimated).

In what follows, we use $P^*$ to denote (outer) probability, which possibly depends on $n$,
and $\Rightarrow$ and $\overset{d}{\to}$ to denote weak convergence in a space of bounded functions and convergence
in distribution for random vectors, respectively, under $P^*$.


[2] Khmaladze tests do not generally have this property, see Koenker and Xiao.


2. The Testing Problem

The questions posed in the fundamental econometric and statistical literature are whether
the treatment exerts a pure location effect, a location-scale effect, or a general shape
effect, cf. Doksum (1974), Koenker and Machado (1999), Koenker and Xiao (2002a), or,
for example, a stochastic dominance effect, cf. Abadie (2002), Heckman and Smith (1997),
and McFadden (1989). Quantile regression is an important and practical tool for learning
about such distributional phenomena.


Suppose $Y$ is the outcome variable, and $X$ are regressors. Let $F_{Y|X}$ and $F^{-1}_{Y|X}(\tau)$ denote
the conditional distribution function and the $\tau$-quantile of $Y$ given $X$. The basic conditional
quantile model takes the linear in parameters form:

$$F^{-1}_{Y|X}(\tau) = X'\beta_n(\tau),$$

for all $\tau \in \mathcal{T}$, where $\mathcal{T} = [\epsilon, 1-\epsilon]$ are quantiles of interest. This is a random coefficient
model $Y = X'\beta_n(U)$, where $U \sim U(0,1)$. The stated model allows regressors to affect the
entire shape of the conditional distribution and includes the classical linear location and
location-scale models as special cases. To facilitate the local power analysis, parameter
$\beta_n(\tau)$ is made dependent on the sample size $n$.

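As in Koenker and Xiao (2002a), we consider the following null hypothesis:

$$R(\tau)\beta_n(\tau) - r(\tau) = \Psi(\tau), \quad \tau \in \mathcal{T}, \qquad (1)$$

where $R(\tau)$ denotes a $q \times p$ matrix, $q \le p = \dim(\beta)$, $r(\tau) \in \mathbb{R}^q$, and $\Psi(\tau)$ denotes a known
function $\Psi : \mathcal{T} \to \mathbb{R}^q$. We assume that functions $R(\tau)$, $\Psi(\tau)$, $r(\tau)$, and $\beta_0(\tau) = \lim_n \beta_n(\tau)$
are continuous in $\tau$.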

The tests will be based on the Koenker-Bassett quantile regression process $\hat\beta_n(\cdot)$:

$$\hat\beta_n(\tau) = \arg\min_{\beta \in \mathbb{R}^p} \sum_{t=1}^{n} \rho_\tau(Y_t - X_t'\beta), \quad \tau \in \mathcal{T},$$

where $\rho_\tau(u) = u(\tau - I(u < 0))$. Other estimators, such as Chamberlain's minimum distance
or instrumental variable quantile regression estimators, can be considered as well, depending
on the problem. We will focus on the basic inference process:

$$\hat v_n(\tau) = \hat R(\tau)\hat\beta_n(\tau) - \hat r(\tau) - \Psi(\tau), \qquad (2)$$

and the Kolmogorov-Smirnov and Smirnov statistics derived from it, $S_n = f(\sqrt{n}\,\hat v_n(\cdot))$:

$$S_n = \sqrt{n}\,\sup_{\tau \in \mathcal{T}} \|\hat v_n(\tau)\|_{\hat V(\tau)}, \qquad S_n = n \int_{\mathcal{T}} \|\hat v_n(\tau)\|^2_{\hat V(\tau)}\, d\tau, \qquad (3)$$

where $\|a\|_V = \sqrt{a'Va}$, the symmetric $\hat V(\tau) \overset{p}{\to} V(\tau)$ uniformly in $\tau$, and $V(\tau)$ is a positive
definite symmetric matrix uniformly in $\tau$. The choice of $V$ and $\hat V$ is discussed in section 3.
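
As a minimal numerical sketch of (2) and (3), consider a special case that is assumed here purely for illustration and is not one of the paper's examples: an intercept-only model (p = q = 1) with R = 1, r = 0, V = 1, and a hypothesized quantile function Ψ equal to the standard normal one. The function names, the grid on T, and the Riemann-sum approximation of the integral are all assumptions of the sketch.

```python
import math
import random
from statistics import NormalDist


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def sample_quantile(y_sorted, tau):
    """An order statistic that minimizes sum_t rho_tau(y_t - beta) over beta."""
    n = len(y_sorted)
    return y_sorted[max(math.ceil(n * tau) - 1, 0)]


def inference_process(y, taus, psi):
    """v_n(tau) = beta_n(tau) - Psi(tau): equation (2) with R = 1 and r = 0."""
    ys = sorted(y)
    return [sample_quantile(ys, t) - psi(t) for t in taus]


def ks_statistic(v, n):
    """sqrt(n) * sup over the grid of |v_n(tau)|, with weight V = 1."""
    return math.sqrt(n) * max(abs(x) for x in v)


def smirnov_statistic(v, n, step):
    """n * integral of v_n(tau)^2 over T, approximated by a Riemann sum."""
    return n * step * sum(x * x for x in v)


random.seed(0)
n = 500
step = 0.01
taus = [0.05 + step * j for j in range(91)]     # grid on T = [.05, .95]
psi = NormalDist().inv_cdf                       # hypothesized quantile function
y = [random.gauss(0.0, 1.0) for _ in range(n)]   # data generated under the null
v = inference_process(y, taus, psi)
S_ks = ks_statistic(v, n)
S_sm = smirnov_statistic(v, n, step)
```

In this special case the quantile regression estimate reduces to an order statistic, so no linear programming solver is needed; with regressors, `sample_quantile` would be replaced by a quantile regression routine.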

Example 1 (Location Hypothesis). An important hypothesis is that of the classical
location-shift regression

$$F^{-1}_{Y|X}(\tau) = X'\alpha + \gamma \cdot F_V^{-1}(\tau), \quad \text{or} \quad Y = X'\alpha + \gamma \cdot V,$$

where $V$ is independent of $X$. In this case, $R(\tau) = R = [0 : I_{p-1}]$ and $r(\tau) = r =
(\alpha_2, \ldots, \alpha_p)'$, which asserts that the quantile regression slopes are constant, independent of
$\tau$. The component $r$ can be estimated by any method consistent with the null, for example
LAD, OLS, and others in Koenker and Xiao (2002a).

Example 2 (Location-scale Hypothesis). The location-scale shift regression constrains
$X$ to affect only the location and scale of $Y$, but not any other moments:

$$F^{-1}_{Y|X}(\tau) = X'\alpha + X'\gamma \cdot F_V^{-1}(\tau), \quad \text{or} \quad Y = X'\alpha + X'\gamma \cdot V,$$

where $V$ is independent of $X$. In this case, $r(\tau) = \alpha + \gamma \cdot F_V^{-1}(\tau)$, $R = [0 : I_{p-1}]$, and
$\Psi(\tau) = 0$. Estimates of $\alpha$ and $\gamma \cdot F_V^{-1}(\tau)$ can be obtained by using OLS projection of each
component of the slopes vector $\hat\beta_{-1}(\cdot)$ on the intercept $\hat\beta_1(\cdot)$, see Koenker and Xiao (2002a).

Example 3 (Stochastic Dominance Hypothesis). Suppose $D = 1$ denotes receipt and
$D = 0$ denotes non-receipt of a treatment. The test of stochastic dominance, or whether
the treatment is unambiguously beneficial, in the model

$$F^{-1}_{Y|X}(\tau) = D\delta(\tau) + X'\theta(\tau),$$

involves the dominance null

$$\delta(\tau) \ge 0, \quad \text{for all } \tau \in \mathcal{T},$$

versus the non-dominance alternative

$$\delta(\tau) < 0, \quad \text{for some } \tau \in \mathcal{T}.$$

In this case, the least favorable null involves $r(\tau) = 0$, $R = -[1, 0, \ldots]$, $\Psi(\tau) = 0$, and
$\beta(\tau) = (\delta(\tau), \theta(\tau)')'$, and one may use the one-sided Kolmogorov-Smirnov or Smirnov
statistics $S_n = \sqrt{n}\,\sup_{\tau \in \mathcal{T}} \max(-\hat\delta_n(\tau), 0)$ and $S_n = \sqrt{n} \int_{\mathcal{T}} \|\max(-\hat\delta_n(\tau), 0)\|_{\hat V(\tau)}\, d\tau$ to test
the hypothesis.

We will maintain the following assumptions.

A.1 $(Y_t, X_t, t \le n)$ is stationary and strongly mixing on probability space $(\Omega, \mathcal{F}, P_n)$.

A.2 The law of $(Y_t, X_t, t \le n)$, $P_n^{[n]}$, is contiguous to some $P^{[n]}$,[3] and either

(a) for a fixed continuous function $p(\tau) : \mathcal{T} \to \mathbb{R}^q$ and for each $n$

$R(\tau)\beta_n(\tau) - r(\tau) = \Psi(\tau) + g(\tau)$, $g(\tau) = p(\tau)/\sqrt{n}$, or

(b) for a fixed continuous function $g(\tau) : \mathcal{T} \to \mathbb{R}^q$ and for each $n$

$R(\tau)\beta_n(\tau) - r(\tau) = \Psi(\tau) + g(\tau)$.

A.3 (a) Under any local alternative, A.2(a), $\sqrt{n}(\hat\beta_n(\cdot) - \beta_n(\cdot)) \Rightarrow b(\cdot)$, $\sqrt{n}(\hat R(\cdot) - R(\cdot)) \Rightarrow \rho(\cdot)$,
$\sqrt{n}(\hat r(\cdot) - r(\cdot)) \Rightarrow -\varsigma(\cdot)$, jointly in $\ell^\infty(\mathcal{T})$, where $(b, \rho, \varsigma)$ are jointly
zero mean Gaussian functions with nondegenerate covariance kernel.
(b) Under the global alternative, A.2(b), the same holds, except that the limit $(b, \rho, \varsigma)$
need not have the same distribution as in A.3(a).


[3] As defined e.g. on p. 87 in van der Vaart (1998).

A.1 allows a wide variety of data processes: iid, time series, and panels. Mixing is
sufficient but not necessary for consistency of subsampling. Stationarity can be replaced
by more general stability conditions, see ch. 4 in Politis, Romano, and Wolf (1999). A.2(a)
and A.2(b) formulate a local and a global alternative. A.3 is a very general condition that
is implied by a wide variety of conditions in the literature, the most remarkable and general of
which are given in Portnoy (1991), who allows shape heteroscedasticity and dependent data.
Thus A.3 substantively generalizes Koenker and Xiao's (2001) or Koenker and Machado's
(1999) conditions (local-to-location-scale assumption and iid sampling), which elegantly suit
the hypotheses in Examples 1 and 2, but are restrictive and not necessary in Example 3.

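Proposition 1. 1. Under conditions A.1, A.2(a), A.3, in $\ell^\infty(\mathcal{T})$,

$$\sqrt{n}\,\hat v_n(\cdot) \Rightarrow v(\cdot) = \underbrace{u(\cdot) + d(\cdot)}_{v_0(\cdot)} + p(\cdot),$$

where $u(\tau) = R(\tau)'b(\tau)$ and $d(\tau) = (\beta_0(\tau)\rho(\tau) + \varsigma(\tau))$. Under the null, $p = 0$,

$$S_n \Rightarrow S = f(v_0(\cdot)).$$

2. Under A.1, A.2(b), A.3, $\sqrt{n}(\hat v_n(\cdot) - g(\cdot)) \Rightarrow \tilde v(\cdot) = u(\cdot) + d(\cdot)$, where $u(\tau) = R(\tau)'b(\tau)$ and
$d(\tau) = (\beta_0(\tau)\rho(\tau) + \varsigma(\tau))$. And $S_n \overset{p}{\to} \infty$ if $f(\sqrt{n}\,g(\cdot) + O_p(1)) \overset{p}{\to} \infty$ (which is true for
statistics in (3) once $g \ne 0$).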

The  limit  consists  of  three  components  that  illustrate  the  Durbin  problem: 

1. The usual component u is typically a Gaussian process with a non-standard covariance
kernel, so its distribution cannot be feasibly simulated. This problem may be assumed
away by imposing iid conditions. However, the problem does not go away once the data are
a time series or a panel. In such settings, Koenker and Xiao's method unfortunately does
not apply in its present form.

2. Component d is the Durbin component that is present because R and r are estimated.
Koenker and Xiao isolate d as the chief problem that causes the entire term v to have a
nonstandard covariance kernel. They use Khmaladzation to annihilate this component.

3.  Component  p,  which  describes  deviations  from  the  null,  determines  the  test's  power. 
As  Koenker  and  Xiao  show,  the  Khmaladzation  inadvertently  removes  some  portion  of  this 
component  as  well.  In  fact  they  gave  examples  where  p  is  removed  completely,  such  as  piece- 
wise  linear  densities.  Such  densities  are  not  commonplace,  yet  they  can  easily  approximate 
a  well  behaved  density.  Nevertheless,  Koenker  and  Xiao  show  that  Khmaladzation  has 
respectable  power  in  most  practical  cases. 

Khmaladzation requires estimation of several nonparametric nuisance functions: the
sparsity functions and various score functions, see Koenker and Xiao for details. The
feasibility of this may depend on the underlying model. Under a location-scale shift model,
the procedure is not laborious. Otherwise, for instance in Example 3, estimation of score
functions is more difficult and has not been implemented, nor has its theory been established.

In the next section, we describe a simple approach that is very useful in practice, does
not erase components of p under any circumstances, and does not require nuisance function
estimation. From a constructive point of view, the approach is not intended to be a critique
of the Koenker and Xiao methods, which are brilliant and useful in many conceivable cases.
Rather, the approach is meant to be a useful complement, aimed at substantively expanding
the scope of empirical inference.

3.  Resampling  Test  and  Its  Implementation 

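3.1. The Test. The approach is based on the mimicking process $\tilde v_n$ and statistic $\tilde S_n$:

$$\tilde v_n(\tau) = \hat v_n(\tau) - g(\tau), \qquad \tilde S_n = f(\sqrt{n}\,\tilde v_n(\cdot)).$$

Proposition 2. 1. Given A.1, A.2(a), A.3, $\sqrt{n}\,\tilde v_n(\cdot) \Rightarrow v_0(\cdot)$, $\tilde S_n \Rightarrow S$. 2. Given A.1, A.2(b),
A.3, $\sqrt{n}\,\tilde v_n(\cdot) \Rightarrow \tilde v(\cdot) = u(\cdot) + d(\cdot)$, $\tilde S_n \Rightarrow \tilde S = f(\tilde v(\cdot))$.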

Under local alternatives, the statistic $\tilde S_n$ correctly mimics the null behavior of $S_n$, even
when the null is false. This does not happen under global alternatives, but this is not
important.

In what follows we use $\hat v_n(\tau)$ itself to estimate $g(\tau)$, and use the subsample bootstrap to
consistently estimate the distribution of $\tilde S$, which equals that of $S$ under the null hypothesis.
The usual bootstrap will also work, but subsampling is preferable both on the computational
grounds explained in Buchinsky (1995) and for the theoretical reasons given in Sakov and Bickel
(2000).[4]

The  basic  idea  of  the  subsample  bootstrap,  introduced  by  Politis,  Romano,  and  Wolf 
(1999),  is  to  approximate  the  sampling  distribution  of  a  statistic  based  on  the  values  of  this 
statistic  computed  over  smaller  subsets  of  data.  The  resampling  tests  based  on  subsampling 
are  done  in  three  steps. 

Step 1. For cases when $W_t = (Y_t, X_t)$ is iid, construct all subsets of size $b$. The number of
such subsets $B_n$ is "n choose b." For cases when $\{W_t\}$ is a time series, construct $B_n = n - b + 1$
subsets of size $b$ of the form $\{W_i, \ldots, W_{i+b-1}\}$. Compute the inference process $\hat v_{n,b,i}(\cdot)$ for
each $i$-th subset, $i \le B_n$.[5]

Denote by $\hat v_n$ the inference process computed over the entire sample and by $\hat v_{n,b,i}$ the
inference process computed over the $i$-th subset of data, and define
$\hat S_{n,b,i} = f(\sqrt{b}\,[\hat v_{n,b,i}(\cdot) - \hat v_n(\cdot)])$, for instance

$$\hat S_{n,b,i} = \sup_{\tau \in \mathcal{T}} \sqrt{b}\,\|\hat v_{n,b,i}(\tau) - \hat v_n(\tau)\|_{\hat V(\tau)} \quad \text{or} \quad \hat S_{n,b,i} = b \int_{\mathcal{T}} \|\hat v_{n,b,i}(\tau) - \hat v_n(\tau)\|^2_{\hat V(\tau)}\, d\tau,$$

for cases when $S_n$ is the Kolmogorov-Smirnov or Smirnov statistic, respectively. Define

$$G(x) = \Pr\{\tilde S \le x\} \quad \text{and} \quad H(x) = \Pr\{S \le x\}.$$

[4] Another attractive alternative is the smoothed bootstrap as in Horowitz (2001). Subsampling is used for
pragmatic, computational reasons. In addition, Sakov and Bickel (2000) show that subsampling, combined
with interpolation, yields the same minimax order of accuracy as smoothing once subsample size $b \propto n^{2/5}$.

[5] A smaller number $B_n$ of randomly chosen subsets can also be used, if $B_n \to \infty$ as $n \to \infty$, cf. Section
2.5 in Politis, Romano, and Wolf (1999).

As $b/n \to 0$ and $b \to \infty$, $\sqrt{b}\,\|\hat v_n(\cdot) - g(\cdot)\| = \sqrt{b} \times O_p(1/\sqrt{n}) \overset{p}{\to} 0$, including when
$g(\tau) = p(\tau)/\sqrt{n}$. Therefore $\|\sqrt{b}\,(\hat v_{n,b,i}(\cdot) - g(\cdot)) + \sqrt{b}\,(g(\cdot) - \hat v_n(\cdot))\| = \|\sqrt{b}\,(\hat v_{n,b,i}(\cdot) - g(\cdot)) + o_p(1)\|$,
uniformly in $i$. Therefore, the distribution of $\hat S_{n,b,i}$ can consistently estimate $G$, which
coincides with $H$ under local alternatives. Thus, the following steps are clear.

Step 2. Estimate $G(x)$ by

$$\hat G_{n,b}(x) = B_n^{-1} \sum_{i=1}^{B_n} 1\{\hat S_{n,b,i} \le x\}.$$

Step 3. The critical value is obtained as the $(1-\alpha)$-th quantile of $\hat G_{n,b}(\cdot)$:[6]

$$c_{n,b}(1-\alpha) = \hat G_{n,b}^{-1}(1-\alpha).$$

The size $\alpha$ test rejects the null hypothesis when $S_n > c_{n,b}(1-\alpha)$.
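
Steps 1-3 can be sketched in a toy setting. The sketch below assumes an intercept-only model with R = 1, r = 0, and V = 1, so that the inference process is the sample quantile process minus a hypothesized quantile function Ψ, and it uses a random collection of subsets in the iid case, as permitted by the remark on randomly chosen subsets. All function names, default values, and the simulated data are hypothetical.

```python
import math
import random
from statistics import NormalDist


def quantile_process(y, taus):
    """Sample tau-quantiles (order statistics) on a grid of quantile indices."""
    ys = sorted(y)
    n = len(ys)
    return [ys[max(math.ceil(n * t) - 1, 0)] for t in taus]


def make_subsets(y, b, n_subsets, rng, time_series=False):
    """Step 1: subsets of size b (consecutive blocks for time series data)."""
    if time_series:
        return [y[i:i + b] for i in range(len(y) - b + 1)]
    return [rng.sample(y, b) for _ in range(n_subsets)]


def subsampling_test(y, taus, psi, b, alpha=0.05, n_subsets=500,
                     time_series=False, seed=0):
    """Return (S_n, critical value, reject) for the Kolmogorov-Smirnov statistic."""
    rng = random.Random(seed)
    n = len(y)
    beta_n = quantile_process(y, taus)
    s_n = math.sqrt(n) * max(abs(q - psi(t)) for q, t in zip(beta_n, taus))

    draws = []
    for sub in make_subsets(y, b, n_subsets, rng, time_series):
        beta_b = quantile_process(sub, taus)
        # Recentered statistic: v_{n,b,i} - v_n = beta_b - beta_n, since Psi cancels.
        draws.append(math.sqrt(b) * max(abs(qb - qn)
                                        for qb, qn in zip(beta_b, beta_n)))

    # Steps 2-3: (1 - alpha)-quantile of the empirical cdf of the draws.
    draws.sort()
    k = max(math.ceil((1 - alpha) * len(draws)) - 1, 0)
    crit = draws[k]
    return s_n, crit, s_n > crit


taus = [0.05 + 0.01 * j for j in range(91)]          # grid on T = [.05, .95]
psi = NormalDist().inv_cdf                            # hypothesized quantile function
rng = random.Random(1)
n = 400
b = math.ceil(5 * n ** 0.4)                           # b = 5 n^(2/5)
y_shifted = [rng.gauss(2.0, 1.0) for _ in range(n)]  # a fixed (global) departure
s_n, crit, reject = subsampling_test(y_shifted, taus, psi, b)
```

Because the subsample statistics are recentered at the full-sample process rather than at Ψ, the critical value stays bounded in this example while the full-sample statistic grows with n, which is the mechanism behind the rejection.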
Theorem 1. Given A.1 - A.3, as $b/n \to 0$, $b \to \infty$, $n \to \infty$, $B_n \to \infty$,

(i) When the null is true, $p = 0$, if $H$ is continuous at $H^{-1}(1-\alpha)$:

$$c_{n,b}(1-\alpha) \overset{p}{\to} H^{-1}(1-\alpha), \qquad P_n(S_n > c_{n,b}(1-\alpha)) \to \alpha.$$

(ii) Under local alternative A.2(a), $p \ne 0$, if $H$ is continuous at $H^{-1}(1-\alpha)$:

$$c_{n,b}(1-\alpha) \overset{p}{\to} H^{-1}(1-\alpha), \qquad P_n(S_n > c_{n,b}(1-\alpha)) \to \beta,$$

where $\beta = \Pr(f(v_0(\cdot) + p(\cdot)) > H^{-1}(1-\alpha))$.

(iii) Under global alternative A.2(b), if $G$ is continuous at $G^{-1}(1-\alpha)$, and $S_n \overset{p}{\to} \infty$:

$$c_{n,b}(1-\alpha) \overset{p}{\to} G^{-1}(1-\alpha), \qquad P_n(S_n > c_{n,b}(1-\alpha)) \to 1.$$

(iv) $H(x)$ and $G(x)$ are absolutely continuous at $x > 0$ when the covariance function of
$v_0$ and $\tilde v$ is nondegenerate.

Thus the resampling test is asymptotically unbiased and has the same power as the
corresponding test that uses a known critical value. Furthermore, if $f(v_0(\cdot) + p(\cdot)) \to \infty$
as $p(\tau) \to \infty$ or $-\infty$, the power $\beta$ goes to one. Under global alternatives, the estimated
critical values are $O_p(1)$ and $S_n \overset{p}{\to} \infty$ for Kolmogorov-Smirnov and Smirnov statistics.


[6] In practice it may be useful to account for an error in $\hat G_{n,b}^{-1}(1-\alpha)$ caused by $B_n$ being "small". In
simulations, we added 1.69 times an estimate of the standard error to $\hat G_{n,b}^{-1}(1-\alpha)$, to make the test more
conservative when $B_n$ is small.


3.2. Approximations. It may sometimes be more practical to use a grid $\mathcal{T}_n$ in place of $\mathcal{T}$
with the largest cell size $\delta_n \to 0$ as $n \to \infty$.

Corollary 1. Propositions 1 and 2 and Theorems 1 and 2 are valid for piece-wise constant
approximations of the finite-sample processes, given that $\delta_n \to 0$ as $n \to \infty$.

3.3. Estimation of $V(\tau)$. In order to increase the test's power we could set

$$V(\tau) = V^*(\tau) = \mathrm{Var}[v(\tau)]^{-1},$$

which is a (generalized) Anderson-Darling weight. In iid samples, there are many methods
for estimating $V^*(\tau)$ uniformly consistently in $\tau$. We are not aware of any results for more
general cases. Note that we would need a Newey and West (1987) type estimator that is
consistent uniformly in $\tau$.

Subsampling  itself  can  be  used  to  estimate  V(t),  even  without  assuming  asymptotic 
integrability  conditions.  This  is  possible  by  using  a  percentile  method  in  conjunction  with 
asymptotic  normality. 

Consider the truncated variance

$$V_K^*(\tau) = \mathrm{Var}[v_K(\tau)]^{-1}, \quad \text{where} \quad v_K(\tau) = v(\tau) \times 1_{K(\tau)}(v(\tau)),$$

and $K(\tau) = \times_{j=1}^{p} [L_j(\tau), U_j(\tau)]$ is a large compact set, e.g., $L_j$ = $\alpha$-quantile of $v_j(\tau)$ and
$U_j$ = $(1-\alpha)$-quantile of $v_j(\tau)$. We can estimate $V_K^*(\tau)$ using Theorem 2 stated below.
Note that having estimated the truncated variance, we may stop there, since for a large $K$,
$V_K^*(\tau) \approx V^*(\tau)$, and Theorem 1 applies to any positive definite symmetric $V(\tau)$.

Second, using that $v(\tau) \sim N(0, V^*(\tau)^{-1})$, we can use the percentile method to obtain an
estimate of the diagonal elements of $V^*(\tau)$ based on $V_K^*(\tau)$. Using symmetrically trimmed
correlations, we can then estimate the off-diagonal elements. In simulations we simply used
un-truncated variances.
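
One way to read the percentile method for a scalar component, offered here as an assumption rather than as the exact procedure used in the paper: if a component is N(0, σ²), its (1−a)- and a-quantiles differ by σ(z_{1−a} − z_a), so σ can be recovered from two empirical quantiles of the recentered subsample draws, and the corresponding diagonal element of V*(τ) is 1/σ². The function name, the choice a = 0.25, and the artificial draws are hypothetical.

```python
import math
from statistics import NormalDist


def percentile_sd(draws, a=0.25):
    """Scale estimate from two empirical quantiles, assuming normality."""
    xs = sorted(draws)
    m = len(xs)
    lower = xs[max(math.ceil(a * m) - 1, 0)]
    upper = xs[max(math.ceil((1 - a) * m) - 1, 0)]
    z = NormalDist().inv_cdf
    return (upper - lower) / (z(1 - a) - z(a))


# Stand-in for recentered subsample draws sqrt(b) * (v_{n,b,i}(tau) - v_n(tau)):
# an idealized N(0, 2^2) sample placed exactly at its own quantiles, plus two
# gross outliers that would distort an un-truncated variance.
z = NormalDist().inv_cdf
m = 1000
draws = [2.0 * z((i + 0.5) / m) for i in range(m)] + [1e6, -1e6]
sigma_hat = percentile_sd(draws)
v_star_hat = 1.0 / sigma_hat ** 2      # estimated diagonal element of V*(tau)
```

The quantile-based estimate is insensitive to the two outliers, which is the practical appeal of truncation when moments of the subsample draws may be poorly behaved.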

Theorem 2 provides estimates, uniformly consistent in $\tau$, of any truncated moments of
the process $\hat v_n(\tau)$, including the trimmed correlations. This theorem is a direct consequence
of the ingenious results of Politis, Romano, and Wolf (1999).

Let $\tau \mapsto v(\tau)$ be an element of $\ell^\infty(\mathcal{T})$, equipped with the sup norm, and $\mathcal{L}(c, k)$ be a class
of measurable Lipschitz functions $\varphi : \ell^\infty(\mathcal{T}) \to \mathbb{R}^K$ that satisfy:

$$\|\varphi(v) - \varphi(v')\| \le c \cdot \sup_{\tau} \|v(\tau) - v'(\tau)\|, \qquad \|\varphi(v)\| \le k,$$

where $c$ and $k$ are suitably chosen positive constants. For probability laws $Q$ and $Q'$, define
the bounded Lipschitz metric (which metrizes weak convergence) as

$$\rho_{BL}(Q, Q') = \sup_{\varphi \in \mathcal{L}(c,k)} \|E_Q \varphi - E_{Q'} \varphi\|.$$

Useful examples of $\varphi$ include $\varphi(v) = v(\tau)^m f_{K(\tau)}(v(\tau))$, where $(v_1, \ldots, v_p)^m = v_1^{m_1} \times \cdots \times
v_p^{m_p}$ and we replace the indicator $1_{K(\tau)}(v(\tau))$ by a smooth approximation $f_{K(\tau)}(v(\tau))$ which
vanishes outside the compact set $K(\tau)$, for all $\tau$. This defines all kinds of truncated moments and
correlations. For example, if $v(\tau)$ is a scalar (for clarity's sake), then

$$\widehat{\mathrm{Var}}[v_K(\tau)] = E_{L_{n,b}}[v(\tau)^2 f_K(v(\tau))] = B_n^{-1} \sum_{i=1}^{B_n} \left[ b\,(\hat v_{n,b,i}(\tau) - \hat v_n(\tau))^2 f_K\big(\sqrt{b}\,(\hat v_{n,b,i}(\tau) - \hat v_n(\tau))\big) \right],$$

where $L_{n,b}$ denotes the subsampling (outer) law of $\sqrt{b}\,(\hat v_{n,b,i}(\cdot) - \hat v_n(\cdot))$ in $\ell^\infty(\mathcal{T})$.

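Theorem 2. Under assumptions A.1-A.3, letting $L$ and $L_0$ denote the laws of $\tilde v(\cdot)$ and $v_0(\cdot)$
in $\ell^\infty(\mathcal{T})$, respectively,

$$\rho_{BL}(L_{n,b}, L) \overset{p}{\to} 0,$$

and $L$ equals $L_0$ under local alternatives. In particular, for functions $(v \mapsto v(\tau)^m f_{K(\tau)}(v(\tau)),\ 
\tau \in \mathcal{T})$ within $\mathcal{L}(c, k)$,

$$\sup_{\tau \in \mathcal{T}} \left| E_{L_{n,b}}[v(\tau)^m f_{K(\tau)}(v(\tau))] - E_L[v(\tau)^m f_{K(\tau)}(v(\tau))] \right| \overset{p}{\to} 0.$$

The last statement remains true even when $f_{K(\tau)}(\cdot)$ is replaced by $1_{K(\tau)}(\cdot)$.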


3.4. Choice of Block Size. In Sakov and Bickel (1999) and in Politis, Romano, and Wolf
(1999) various rules are suggested for choosing an appropriate subsample size. Politis, Romano,
and Wolf (1999) focus on the calibration and minimum volatility methods. The calibration
method involves picking the optimal block size and appropriate critical values on the basis of
simulation experiments conducted with a model that approximates the situation at hand. The
minimum volatility method involves picking (or combining) among the block sizes that yield
more stable critical values. More detailed suggestions emerge from Sakov and Bickel (1999)
and Buchinsky (1995). Sakov and Bickel (2000) suggest that choosing $b = k n^{2/5}$ yields[7]
the optimal minimax accuracy (in conjunction with extrapolation). Our own experiments
indicated that values of the constant $k$ between 3 and 10 are attractive both computationally and
qualitatively, which accords well with the results of Sakov and Bickel (1999) for the sample
median.
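
A small sketch of two of these rules, under stated assumptions: the rule b = k n^{2/5} with a user-chosen constant k, and a simple version of the minimum volatility idea in which the block size whose neighbouring critical values have the smallest standard deviation is selected. The function names, the neighbourhood width, and the example critical values are hypothetical and are not taken from the paper's experiments.

```python
import math
import statistics


def rule_of_thumb_block_size(n, k=5.0):
    """Subsample size b = k * n^(2/5), rounded up and capped at n."""
    return min(n, max(1, math.ceil(k * n ** 0.4)))


def minimum_volatility_choice(block_sizes, critical_values, window=2):
    """Pick the block size whose neighbouring critical values vary the least.

    block_sizes and critical_values are parallel lists ordered by block size;
    window is the number of neighbours considered on each side.
    """
    best_b, best_sd = None, float("inf")
    for j in range(window, len(block_sizes) - window):
        sd = statistics.pstdev(critical_values[j - window:j + window + 1])
        if sd < best_sd:
            best_b, best_sd = block_sizes[j], sd
    return best_b


b_default = rule_of_thumb_block_size(1000, k=5.0)

# Illustrative critical values computed (hypothetically) at each block size.
sizes = [10, 20, 30, 40, 50, 60, 70]
crits = [3.0, 2.0, 2.5, 2.52, 2.49, 2.9, 3.5]
b_stable = minimum_volatility_choice(sizes, crits, window=1)
```

In practice the critical values would come from running Steps 1-3 at each candidate block size.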


[7] Their result is for the subsample bootstrap with replacement. However, the replacement and
non-replacement versions are asymptotically equivalent once $b^2/n \to 0$. See e.g. Politis, Romano, and Wolf
(1999).


4. A Computational Example

The computational experiment that we consider is that of Koenker and Xiao (2002a).
This allows us to compare the performance of the resampling test vs. Khmaladzation
without prejudicing against the latter. Consider the location-shift hypothesis as in Example
1. The data are generated from the model:

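$$Y_i = \alpha + \beta X_i + \sigma(X_i) \cdot \epsilon_i,$$

$$\sigma(X_i) = \gamma_0 + \gamma_1 X_i,$$

$$\epsilon_i \sim N(0, 1), \quad X_i \sim N(0, 1),$$

$$\alpha = 0, \quad \beta = 1, \quad \gamma_0 = 1.$$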

Under the null hypothesis $\gamma_1 = 0$. We examine the empirical rejection probabilities for the
test for different choices of sample sizes and heteroscedasticity parameter $\gamma_1$. In constructing
the test, we used the OLS estimate of $\beta$ and $\mathcal{T} = [.05, .95]$. When $\gamma_1 = 0$ the model is a
location-shift model, and the rejection rates yield the empirical sizes. When $\gamma_1 \ne 0$ the
model is a heteroscedastic model, and the rejection rates give the empirical powers. Table
1 reports the results and compares them with Khmaladzation. Other details of the setup
are as reported in Koenker and Xiao (2002b).
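
The design above can be simulated as follows. This is only a sketch of the data-generating process as stated; the function name, the seeding, and the example values of γ1 are assumptions, and the test itself (including the OLS step) is not reproduced here.

```python
import random


def simulate_design(n, gamma1, alpha=0.0, beta=1.0, gamma0=1.0, seed=None):
    """Draw (y, x) from Y = alpha + beta*X + (gamma0 + gamma1*X) * eps."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]      # X_i ~ N(0, 1)
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]    # eps_i ~ N(0, 1)
    y = [alpha + beta * xi + (gamma0 + gamma1 * xi) * ei
         for xi, ei in zip(x, eps)]
    return y, x


# gamma1 = 0 gives the location-shift null; gamma1 != 0 gives heteroscedasticity.
y_null, x_null = simulate_design(100, gamma1=0.0, seed=3)
y_alt, x_alt = simulate_design(100, gamma1=0.5, seed=3)
```

Using the same seed for both calls keeps the regressors and errors fixed, so the two samples differ only through the heteroscedasticity parameter.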

Table 2 speaks for itself. The resampling test is powerful and accurate even in small
samples. From these results, it is fair to say that the resampling test emerges as a respectable,
serious complement to the Khmaladzation method. The method is also quite robust to
variation of subsample size: a wide variety of subsample sizes performs very well (including
when the resampling mechanism is the n out of n bootstrap), suggesting that even fairly
small subsamples ($b = 5n^{2/5}$) are both computationally and qualitatively attractive.

5.  An  Empirical  Application 

To illustrate the present approach, we will re-analyze and expand on the main empirical
question considered in Koenker and Xiao (2002a). The question concerns the Pennsylvania
re-employment bonus experiment,[8] which was conducted by the U.S. Department of Labor
in the 1980s in order to test the incentive effects of an alternative compensation
scheme for unemployment insurance (UI). In these controlled experiments, UI claimants
were randomly offered a cash bonus if they found a job within some prespecified period of time
and if the job was retained for a specified duration. The goal was to evaluate the impact of
such a scheme on the unemployment duration.

As in Koenker and Xiao (2002a), we focus on the compensation schedule that includes a
lump-sum payment of six times the weekly unemployment benefit for claimants establishing
reemployment within 12 weeks (in addition to the usual weekly benefits). The definition
of the unemployment spell includes one waiting week, with the maximum of uninterrupted full
weekly benefits of 27.

The model under consideration is the linear conditional quantile model for the logarithm
of duration:

$$Q_{\log(T)}(\tau | X) = \alpha(\tau) + \delta(\tau) \cdot D + X'\beta(\tau),$$

where $T$ is the duration of unemployment, $D$ is the indicator of the bonus offer, and $X$
is a set of socio-demographic characteristics (age, gender, number of dependents, location
within the state, existence of recall expectations, and type of occupation). Further details
are given in Koenker and Bilias (2001). The estimate of $\delta(\cdot)$ is plotted in Figure 1.

[8] There is a significant empirical literature focusing on the analysis of this and other similar experiments,
see e.g. the review in Meyer (1995).


[Figure 1: Quantile Treatment Effect for Unemployment Duration. Horizontal axis: Quantile Index.]


The three basic hypotheses described in Table 3 include:

• treatment effect is constant across most of the distribution ($\mathcal{T} = [.15, .85]$),

• treatment affects only the location and scale of the outcome $\log(T)$,

• treatment effect is unambiguously beneficial: $\delta(\tau) \le 0$ for all $\tau \in \mathcal{T}$.

These hypotheses specialize Examples 1-3 to the present case. The resampling test is
implemented following section 3, and the results are given in Table 3. The tests were
implemented for a subsample size of 3000. We could not consider subsamples of smaller sizes
because they often yielded singular designs (many components of X are dummy variables
taking on a positive value with probability 2-10%). Thus, we were effectively dealing with a small
sample even though n = 6384 (see Goldberger (1991) on characterizing close-to-singular
designs as, effectively, small-sample designs).

The first two hypotheses are decisively rejected, strongly supporting the conclusion of
Koenker and Xiao (2002a). This is notable given that the resampling tests provide accurate
and powerful inferences even in effectively small samples, as we saw in Table 2. The
rejection of these hypotheses is additional strong evidence in favor of the quantile inference
paradigm of Lehmann-Doksum,[9] which emphasizes the ubiquity of quantile shift effects and the
general impossibility of describing such treatment effects as merely shifting the location and
scale.

The hypothesis of stochastic dominance, the third one, is decisively supported. The test
statistic is 0, while rejection requires that it exceed the value of 2.6. This
additional result complements the set of inferences given in Koenker and Xiao (2002a).
Thus the bonus offer creates a first order stochastic dominance effect on the unemployment
duration, supporting the efficacy of the program.

6.  Conclusion 

A simple and practical resampling test is offered as an alternative to the Khmaladzation
technique suggested in Koenker and Xiao (2002a). This alternative has optimal power
(the same power as the test with a known critical value) and does not require estimation of
nonparametric nuisance functions. It applies both to iid and time series data. Finite-sample
experiments provide strong evidence in favor of this technique, and an empirical application
illustrates its utility.

Appendix  A 

Proof  of  Proposition  1  and  2  The  result  is  immediate  from  A.2-A.3.  ■ 

Proof of Theorem 1. Part II of the proof follows standard arguments for subsampling consistency, as
in Politis, Romano, and Wolf (1999). There are a few details that we have to fill in first. We
give the proof for the Kolmogorov-Smirnov statistic. Extensions to other statistics defined in the text are
straightforward.

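I. To prove (i)-(iii), define $G_{n,b}(x)$ and write out $\hat G_{n,b}(x)$:

$$\hat G_{n,b}(x) = B^{-1} \sum_{i} 1\Big[ \sup_\tau \big\| \hat V^{-1/2}(\tau) \big( \sqrt{b}\,(v_{n,b,i}(\tau) - g(\tau)) + \sqrt{b}\,(g(\tau) - v_n(\tau)) \big) \big\| \le x \Big],$$

$$G_{n,b}(x) = B^{-1} \sum_{i} 1\Big[ \sup_\tau \big\| V^{-1/2}(\tau)\, \sqrt{b}\,(v_{n,b,i}(\tau) - g(\tau)) \big\| \le x \Big].$$

$E G_{n,b}(x) = P_n(S_b \le x)$. In the iid case, because $G_{n,b}(x)$ is a U-statistic of degree $b$, and otherwise by the LLN
in Politis, Romano, and Wolf (1999), Theorem 3.2.1, combined with contiguity, conclude that $G_{n,b}(x) \to_p G(x)$.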
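Next collect two facts. Fact 1: uniformly in $i$,

$$\sqrt{1/\lambda_n} \;\le\; \frac{\big\| \hat V^{-1/2}(\tau) \big( \sqrt{b}\,(v_{n,b,i}(\tau) - g(\tau)) + \sqrt{b}\,(g(\tau) - v_n(\tau)) \big) \big\|}{\big\| V^{-1/2}(\tau) \big( \sqrt{b}\,(v_{n,b,i}(\tau) - g(\tau)) + \sqrt{b}\,(g(\tau) - v_n(\tau)) \big) \big\|} \;\le\; \sqrt{\bar\lambda_n},$$

where $\lambda_n = \sup_\tau \mathrm{maxeig}\big( V^{-1/2}(\tau) \hat V(\tau) V^{-1/2}(\tau) \big)$ and $\bar\lambda_n = \sup_\tau \mathrm{maxeig}\big( \hat V^{-1/2}(\tau) V(\tau) \hat V^{-1/2}(\tau) \big)$, by
equality 10 on p. 460 in Amemiya (1985). Fact 2 follows from Fact 1 and from $\|A\| - \|w\| \le \|A + w\| \le \|A\| + \|w\|$:

$$1[A_i \le (x/u_n - w_n)] \;\le\; 1[\hat A_i \le x] \;\le\; 1[A_i \le (x/l_n + w_n)],$$

where $\hat A_i$ and $A_i$ denote the suprema inside the indicators defining $\hat G_{n,b}(x)$ and $G_{n,b}(x)$, respectively,
$l_n = \sqrt{1/\lambda_n}$ and $u_n = \sqrt{\bar\lambda_n}$, and $w_n$ is defined below.

Footnote: The Lehmann-Doksum paradigm is formulated in a series of works by Lehmann (1974), Doksum (1974), Koenker and
Machado (1999), and Koenker and Bilias (2001).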

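By A2 and A3 and the assumptions on $V$ and $\hat V$, $w_n = \sup_\tau \big\| \sqrt{b}\, V^{-1/2}(\tau)(v_n(\tau) - g(\tau)) \big\| = O_p(\sqrt{b}/\sqrt{n}) \to_p 0$
and $q_n = \max[|u_n - 1|, |l_n - 1|] \to_p 0$. Thus $1(E_n) = 1$ w.p. $\to 1$, where $E_n = \{w_n, q_n \le \delta\}$ for any $\delta > 0$.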

II. Thus for small enough $\epsilon > 0$ there is $\delta > 0$, so that by Fact 2: $G_{n,b}(x - \epsilon) 1(E_n) \le \hat G_{n,b}(x) 1(E_n) \le
G_{n,b}(x + \epsilon) 1(E_n)$, so that with probability tending to one: $G_{n,b}(x - \epsilon) \le \hat G_{n,b}(x) \le G_{n,b}(x + \epsilon)$. Now pick
$\epsilon > 0$ so that $x - \epsilon$ and $x + \epsilon$ are continuity points of $G(x)$. For such small enough $\epsilon$, $G_{n,b}(x + c) \to_p G(x + c)$
for $c = \epsilon$ and $c = -\epsilon$, which implies $G(x - \epsilon) - \epsilon \le \hat G_{n,b}(x) \le G(x + \epsilon) + \epsilon$ w.p. $\to 1$. Since $\epsilon$ and $\delta(\epsilon)$
can be set as small as we like, $\hat G_{n,b}(x) \to_p G(x)$. Now note that $x = G^{-1}(1 - \alpha)$ is a continuity point by
assumption. Convergence of quantiles is implied by the convergence of distribution functions at continuity
points.

III.  (iv)  follows  from  Lifshits  (1982)  or  Davydov,  Lifshits,  and  Smorodina  (1998)  by  A3.  ■ 
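
As an illustration of the critical-value construction whose consistency Theorem 1 establishes, the following minimal Python sketch computes a recentered subsampling critical value for a Kolmogorov-Smirnov statistic. It is a simplification under stated assumptions, not the program used for the experiments: it assumes the two-sample case with an indicator regressor (so the process is the difference of treated and control quantiles), sets the weight $V(\tau)$ to the identity, draws subsamples without replacement, and tests the illustrative null of no effect at any quantile. All function names are hypothetical.

```python
import math
import random


def quantile(sorted_xs, tau):
    """Empirical quantile: the order statistic of rank ceil(tau * len)."""
    k = max(math.ceil(tau * len(sorted_xs)) - 1, 0)
    return sorted_xs[k]


def qte_process(treated, control, taus):
    """v(tau): treated-minus-control quantile difference on a grid of taus."""
    t, c = sorted(treated), sorted(control)
    return [quantile(t, tau) - quantile(c, tau) for tau in taus]


def ks_stat(scale, v, center):
    """Unstudentized Kolmogorov-Smirnov statistic (assumes V = identity)."""
    return scale * max(abs(a - b) for a, b in zip(v, center))


def subsampling_test(treated, control, taus, b, n_draws=250, alpha=0.05, seed=0):
    """Return (statistic, critical value, reject) for H0: v(tau) = 0 for all tau."""
    rng = random.Random(seed)
    n = len(treated) + len(control)
    v_n = qte_process(treated, control, taus)
    stat = ks_stat(math.sqrt(n), v_n, [0.0] * len(taus))
    # Assumption: split the block size b across arms in proportion to arm sizes.
    b_t = max(2, round(b * len(treated) / n))
    b_c = max(2, b - b_t)
    draws = []
    for _ in range(n_draws):
        v_b = qte_process(rng.sample(treated, b_t), rng.sample(control, b_c), taus)
        # Recenter at the full-sample estimate v_n, not at the hypothesized value.
        draws.append(ks_stat(math.sqrt(b_t + b_c), v_b, v_n))
    draws.sort()
    # Critical value: the (1 - alpha) empirical quantile of the subsample statistics.
    crit = draws[min(math.ceil((1 - alpha) * n_draws) - 1, n_draws - 1)]
    return stat, crit, stat > crit
```

The recentering at the full-sample estimate is what lets the subsample statistics mimic the null distribution whether or not the null holds; the choice of 250 draws mirrors the number used in Table 2, while everything else about the design is illustrative.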

Proof of Theorem 2. I. The first statement is a direct corollary of Theorem 7.3.1 in Politis,
Romano, and Wolf (1999). II. The class of moment functions $f$ defined prior to the proof is clearly Borel
measurable, since each of these functions is a measurable function of the random vector $v_n(\tau)$. The convergence of
subsampling  truncated  moments  thus  follows  from  the  definition  of  pi,.  III.  Finally,  because  v(t)  is  non- 
degenerate  uniformly  in  r,  it  has  continuous  bounded  density  by  Gaussianity,  by  A. 3,  Ev(t)p1k^^(v(t)) 
can  be  approximated  arbitrarily  well  by  Ev{r)p fK(-)(v{r))  for  some  v  >->  f(T)p/K(-)  6  L(c',k')  for  some  c', 
/.'    ■ 
Table 1. Empirical Rejection Results for 5% level Khmaladze Test

            H = .5 Bofinger                H = .6 Bofinger                H = Bofinger
            Size     Power                 Size     Power                 Size     Power
            γ = 0    γ = .2   γ_1 = .5     γ = 0    γ = .2   γ_1 = .5     γ = 0    γ = .2   γ_1 = .5
n = 100     0.101    0.264    0.898        0.035    0.211    0.755        0.016    0.126    0.641
n = 200     0.070    0.480    0.988        0.041    0.406    0.990        0.022    0.280    0.964
n = 300     0.062    0.622    0.998        0.043    0.665    1.000        0.029    0.416    0.998
n = 400     0.043    0.809    1.000        0.043    0.809    1.000        0.035    0.632    1.000

Notes: All results are from Koenker and Xiao (2002b). Symbol H denotes different bandwidth choices
relative to the Bofinger rule.


Table 2. Empirical rejection results for 5% resampling test (Smirnov Statistic),
for various K, $b = K \times n^{2/5}$, using 250 bootstrap draws and 500 repetitions.

            Subsampling Test (K = 5)       Subsampling Test (K = 10)      Bootstrap Test (b = n)
            Size     Power                 Size     Power                 Size     Power
            γ = 0    γ = .2   γ_1 = .5     γ = 0    γ = .2   γ_1 = .5     γ = 0    γ = .2   γ_1 = .5
n = 100     0.014    0.348    0.980        0.026    0.350    0.954        0.022    0.316    0.968
n = 200     0.052    0.752    1.000        0.059    0.728    1.000        0.038    0.728    1.000
n = 300     0.058    0.910    1.000        0.058    0.924    1.000        0.074    0.918    1.000
n = 400     0.054    0.980    1.000        0.064    0.978    1.000        0.056    0.970    1.000

Maximal sim. s.e.: 0.009 (K = 5), 0.009 (K = 10), 0.009 (b = n).

Notes: All results are reproducible and the programs are available from the author.
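
To make the block-size rule in the caption of Table 2 concrete, the short sketch below evaluates $b = K \times n^{2/5}$ for the sample sizes in the table. Truncating to an integer is an assumption, since the text does not state a rounding rule, and the function name is hypothetical.

```python
def block_size(n, k):
    """Subsample block size b = K * n**(2/5), truncated to an integer (assumed)."""
    return int(k * n ** 0.4)


# Block sizes for the two subsampling designs (K = 5 and K = 10) in Table 2.
block_sizes = {n: (block_size(n, 5), block_size(n, 10)) for n in (100, 200, 300, 400)}

for n, (b5, b10) in block_sizes.items():
    print(f"n = {n}: b = {b5} (K = 5), b = {b10} (K = 10)")
```

Under this rule the blocks stay small relative to n, in contrast to the bootstrap column of the table, where b = n.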


Table 3. The test results for the re-employment bonus treatment, using
b = 3000 (subsampling with replacement)

Hypothesis              Null                    Alternative             Smirnov Statistic   5% level critical value   Decision
Location-shift          δ(τ) = δ                δ(τ) ≠ δ                2.46                1.31                      Reject
Location-scale shift    δ(τ) = α + γ q(τ)       δ(τ) ≠ α + γ q(τ)       2.47                1.30                      Reject
Dominance Effect        δ(τ) ≤ 0                ∃τ: δ(τ) > 0            0.00                4.59                      Accept
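
The decisions in Table 3 follow from comparing each Smirnov statistic with its 5% critical value. A minimal sketch of that rule, using only the values reported in the table (the variable and function names are illustrative):

```python
# (hypothesis, Smirnov statistic, 5% level critical value), copied from Table 3.
rows = [
    ("Location-shift", 2.46, 1.31),
    ("Location-scale shift", 2.47, 1.30),
    ("Dominance Effect", 0.00, 4.59),
]


def decide(statistic, critical_value):
    """Reject the null when the statistic exceeds the critical value."""
    return "Reject" if statistic > critical_value else "Accept"


decisions = {name: decide(stat, crit) for name, stat, crit in rows}
```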